Chip 1997 March

Text File | 1989-06-24 | 30.3 KB | 615 lines

Digitized Voice Programmer's Toolkit for the PC
-----------------------------------------------
Version 1.0
Copyright (c) 1988,1989, Farpoint Software

* * * * * * * * * * * * * * * * *

**************************************************************************
*                                                                        *
*  To those of you who have HIDI.ARC and/or DIGITS.ARC, welcome back.    *
*  This new release will serve as a major upgrade to things you already  *
*  have.                                                                 *
*                                                                        *
**************************************************************************

Introduction
------------

This toolkit is a combination of software and hardware designed to simplify the process by which programmers create digitized voice recordings, store them on disk, edit the voice data files, and incorporate digitized voice playback into their own high-level language programs.

Recording digitized voice requires building a small, inexpensive hardware device. Schematics and printed circuit board layout files are provided for this device. Playback of the digitized voice, however, requires NO SPECIAL HARDWARE. The sound is produced with the built-in speaker found in nearly all PCs and PC-compatible machines. This means that programs may be written for general distribution which will play voice messages on the user's machine as it exists.

Here is a list of the major features of the current software package:

(1) Operates under the DOS environment.

(2) Provides a full set of voice record/playback control routines which are directly callable from many high-level languages, including C and Pascal. They can, of course, also be called from assembly language.

(3) All voice operations proceed IN THE BACKGROUND. The control routines return to the caller immediately, and voice playback occurs simultaneously with the continuing execution of the main program. The main program may call a status routine at any time to check on the progress of the voice playback.

(4) There are no limits on the size of the memory buffers or of the voice data files other than the physical limits of the machine itself. 64k is not a special number.

(5) A sophisticated voice data file editor is provided. It gives the programmer a set of capabilities similar to those of a conventional tape recorder. Position markers, live overwriting, selective erasure, cut-and-paste, and assorted other features make the production of "refined" voice files an easy task.

(6) Several short example programs, written in both C and assembly language, demonstrate the use of the calls to the voice modules. There is even an example of a memory-resident program which detects the pressing of the left shift key and plays a short voice message when this occurs. (Foreground processing continues undisturbed.)

Shareware Notice
----------------

The Digitized Voice Programmer's Toolkit is released as Shareware. This is copyrighted material; it is NOT "free software". You are permitted to experiment with this package long enough to determine whether it suits your needs, but if you will be making use of the material in your own programs, a license fee of $50 is required.

NO PROGRAM WHICH MAKES USE OF THE MATERIALS IN THIS TOOLKIT MAY BE SOLD COMMERCIALLY OR ON A CONTRACT BASIS UNLESS THE SELLER HAS PAID THE LICENSE FEE.

Please make the check or money order payable to:

     Farpoint Software
     2501 Afton Court
     League City, Texas 77573

For convenience, a registration form is included in the file REGISTER.FRM.

As a registered user, you will receive updates automatically, long before they are released to BBSs. You will also receive a copy of the source code to the VDFE editor. Registered users are also given higher priority when they request programming or hardware construction assistance.

You are granted permission to distribute copies of the Digitized Voice Programmer's Toolkit, provided that (1) no fee is charged for such copies, other than a nominal disk duplication fee, (2) these files are distributed in their original, unmodified form, and (3) ALL the files in the original archive are included with each copy. (See "List of Files" below.)

If you paid a "disk duplication fee" or other such fee to a distributor of public domain and shareware programs, be aware that the payment of this fee DOES NOT constitute registration of this Toolkit. Likewise, the payment of a fee to any Bulletin Board Service for the time required to download this Toolkit DOES NOT constitute registration. Registration occurs only through direct interaction with Farpoint Software.

For more information, write to Farpoint Software at the address above, or contact Alan D. Jones through CompuServe Information Service at user ID 74030,554.

List of Files
-------------

The files included with the Digitized Voice Programmer's Toolkit are:

     BIN2ASM         BIN2ASM.C       BIN2ASM.EXE     EMBEDDED
     EMBEDDED.C      EMBEDDED.EXE    EVM.PRE         EVM.SUF
     EVM.VOI         LONGTEST.VOI    README.1ST      REGISTER.FRM
     RUN_ME.BAT      TSR             TSR.ASM         TSR.EXE
     TSRVM.PRE       TSRVM.SUF       TSRVM.VOI       VDFE.EXE
     VMSCH.HPP       VOICEKIT.DOC    VPMOD.ASM       VPMOD.DOC
     VPMOD.H         VPMOD.OBJ       VPTEST          VPTEST.C
     VPTEST.EXE      VRMOD.ASM       VRMOD.DOC       VRMOD.H
     VRMOD.OBJ       VRTEST          VRTEST.C        VRTEST.EXE

If you received the Toolkit with any of the above files missing, please notify Farpoint Software.

Description of Voice Subroutine Modules
---------------------------------------

The key software elements in the kit are two assembly language programs, VRMOD.ASM and VPMOD.ASM, and their assembled OBJ files. These are not stand-alone programs; they are designed to be linked with other programs to provide the voice control routines. The calls associated with recording are in VRMOD, and the calls associated with playback are in VPMOD. Any given program may be linked with either or both of these modules.
Typically, a program designed for general distribution would be linked only with VPMOD, since recording requires the hardware device.

The external hooks to the two modules consist of various "public" procedure names. All procedures use the Pascal calling convention, since most high-level language compilers support it. The Pascal calling convention means the following:

(1) Procedure names are all caps and are not preceded by an underscore.

(2) Procedures are called with "far" (intersegment) calls.

(3) Short return values appear in the AX register; long return values appear in DX:AX.

(4) Parameters are pushed onto the stack in left-to-right order; i.e., the first parameter in the list is pushed first. If the parameter is a doubleword, the high-order word is pushed first.

(5) The called subroutine is responsible for clearing the parameters from the stack upon return.

The above list will be of interest primarily to assembly language programmers. When working in a high-level language, you need only make sure that the compiler is using the proper calling method.

For C programs, two header files have been included: VRMOD.H and VPMOD.H. At the beginning of any C program which is to use the voice playback routines, insert the line:

     #include "vpmod.h"

This file contains prototypes of all procedure calls in VPMOD.ASM, declared in a way that causes the compiler to generate correct calling code.

The details of how each individual procedure call operates are given in the separate documents VRMOD.DOC and VPMOD.DOC. It is suggested that you print these files for use as reference material while writing programs.

It is possible to link both VRMOD.OBJ and VPMOD.OBJ to the same program, but you should NOT have both packages initialized at the same time.
Each package assumes "ownership" of timer channel zero, so having both initialized would cause a conflict over the setting of the hardware timer interval, not to mention the risk of insufficient CPU time to execute both interrupt routines at every timer tick (at 16500 Hz). The solution is (1) never attempt to record and play back at the same time, and (2) don't call PVOICE_INIT until playback is ready to begin, and be sure to call PVOICE_CLEANUP immediately after playback ends. (Similar rules apply to recording.)

Example Programs
----------------

Note: "Make" files acceptable to Microsoft's Make utility are included for all the example programs. The compiler used was the Microsoft C Compiler version 5.10. The assembler was the Microsoft Macro Assembler version 5.10. The make files assume that the compiler was installed with the large-model library and that the default operating system is DOS. If the compiler defaults to the OS/2 operating system, change the make files so that all occurrences of "llibce" become "llibcer".

VRTEST.C (VRTEST.EXE):
[Related files: VRTEST]

This program works like RECORD.COM, which was provided with the first voice digitization package released in 1988. It demonstrates the use of all the procedure calls and features in VRMOD. To execute the program, first attach the voice recording circuit to a COM port, then type at the DOS prompt:

     VRTEST 1 TESTFILE.VOI

If you are using COM2, substitute "2" for the "1". The filename "TESTFILE.VOI" may be any filename. Recording will begin, and messages will scroll on the screen indicating the number of bytes of data recorded. Writing to the file is performed "on the fly". The memory buffer size is currently set to 16k, but may be changed by editing and recompiling the program. Recording continues until either the <Esc> key is pressed or the disk is full.

The memory buffer should be at least 8k; beyond that, its size is actually irrelevant as long as RVOICE_CATCHUP is called frequently enough (at least once every 3 seconds).

VPTEST.C (VPTEST.EXE):
[Related files: VPTEST]

This is the counterpart to VRTEST. It demonstrates the use of all the procedure calls in VPMOD. As in VRTEST, the memory buffer is currently 16k but may be changed by editing and recompiling. The command line to execute the program is:

     VPTEST TESTFILE.VOI

where "TESTFILE.VOI" is the name of a file containing voice data. The file is read as needed to keep the buffer full, until all of its bytes have been read. The memory buffer needs to be larger than 8k only if it is not possible to call PVOICE_CATCHUP at least once every 3 seconds. (It may also be advisable to increase the buffer size if the file is being read from a floppy disk, since accesses may be quite slow.)

EMBEDDED.C (EMBEDDED.EXE):
[Related files: EMBEDDED, EVM.VOI, EVM.PRE, EVM.SUF]

This is a simple example of the techniques used to embed voice data in an executable program. Instead of reading a separate voice file, the program carries the voice data as part of its EXE file. Note that the "make" file in this case is as important to study as the C program.

The trick here is to convert the raw binary voice data file into an OBJ file that can be fed through the linker. This is done in three stages:

(1) The file-cruncher program BIN2ASM is used to create a file containing only a long list of assembly language DB statements equivalent to the binary data;

(2) The prefix file EVM.PRE and the suffix file EVM.SUF are combined with the DB statements to form an assembly language module containing all necessary segment brackets and public declarations;

(3) This module is assembled and linked with the main program.

The contents of the prefix and suffix files depend on the specific application; in this example we use only a single segment and a single block of voice data. A more complex program may contain several modules of this type or have an assortment of labels within a single module.

Since the assembler requires segments to be 64k or less, BIN2ASM places a marker comment (a semicolon and a string of minus signs) at each 64k boundary in its output file. If the data is longer than 64k, you must edit the file to end a segment and begin a new one at each of these boundaries.

TSR.ASM (TSR.EXE):
[Related files: TSR, TSRVM.VOI, TSRVM.PRE, TSRVM.SUF]

This serves both as an example of a pure assembly language program using VPMOD and as a demonstration of a technique for including voice playback in a memory-resident program. The voice data is embedded in the EXE file in the same way as in EMBEDDED.EXE above. Otherwise, the program is fairly conventional.

There is one major caution to observe, however: since a memory-resident program may play voice concurrently with the execution of another, unknown program, don't set the file read flag (in PVOICE_START) to 1 and don't use PVOICE_CATCHUP! The "read-on-the-fly" feature of the voice control routines calls DOS to read the disk. If a DOS call is made within an interrupt service routine (especially a timer tick routine), the interrupt may have occurred while a DOS call was already in progress. In this case, DOS will be "re-entered", and it is NOT re-entrant. Doing this will almost certainly cause a system crash.

If you are already familiar with the above problem and have worked out a system of calling DOS in the background during its "safe" moments, then you will probably be able to use read-on-the-fly. Always call PVOICE_START, PVOICE_INIT, PVOICE_CLEANUP, and PVOICE_CATCHUP during "safe" times.

Also, remember that timer interrupts will now be occurring at about 16500 Hz, so make sure that your program never disables interrupts for more than a very short time. (One more thing: if you must hook INT 8, do it BEFORE calling PVOICE_INIT.)

The Voice Data File Editor (VDFE)
---------------------------------

This program provides a convenient environment for creating, editing, and generally patching together voice data files. Its function resembles that of a tape recorder.

It edits files only within its RAM buffer, which is limited by the amount of memory available to DOS. On a 640k machine, this translates to about 470k of buffer space, or 225 seconds (3 minutes and 45 seconds) of continuous sound. If you need to edit nonstop chunks of voice data longer than that, they will have to be edited piecemeal and concatenated afterward. (Of course, multi-megabyte voice data files may be recorded using VRTEST or a similar program. If it turns out that people really need to edit super-long files on a regular basis, I will include unlimited-length file editing in a future release.)

VDFE requires no command line parameters. Upon execution, it displays its primary screen and waits for user input, which consists primarily of single-keystroke commands. These are documented in some detail here:

<Up arrow> and <Down arrow>: Scroll the contents of the Operating Instructions window in the lower right area of the screen. The window displays one-line descriptions of all the keystroke commands.

<F1>: Displays an information screen which briefly describes the purpose and operation of VDFE.

<Esc>: Exits to DOS. If the contents of the editing buffer have been altered since the last save to disk, the user is asked to confirm the exit command.

<F2>: Increments the COM port number shown at the left side of the screen. This is the port used for recording. Press <F2> repeatedly until the desired port number shows.

<F3>: Requests a file name, then loads the file into the edit buffer starting at offset zero. The end-of-file position is set to match the length of the file. If the specified file does not exist, the user is asked whether to create it. If the answer is "yes", a zero-length file is created and the end-of-file position is set to zero; the actual data in the edit buffer remains unchanged.

<Alt F3>: Requests the entry of a new file name. This becomes the current file name, as shown at the left side of the screen. Nothing is done with this name immediately; it will be used in subsequent "save current data" (<F4>) operations.

<F4>: Saves current data. The current file is opened and truncated, and the contents of the edit buffer from offset zero to the offset shown as the end-of-file are written to the file.

<Space bar>: The "stop" button. If a record or playback operation is in progress, it is stopped.

<Enter>: The "play" button. The contents of the edit buffer are played back through the speaker starting from the current position. Playback ends at the end-of-file position. If the current position is greater than or equal to the end-of-file position, playback will not occur.

<Insert>: The "record" button. Digitized voice is input through the selected COM port and written into the edit buffer. Writing begins at the current position, overwriting existing data. Recording can be stopped by pressing <Space>, <Enter>, or any key which normally changes the current position. If the current position during recording exceeds the end-of-file position, the end-of-file position is moved forward continuously to match the current position. If the current position reaches the end of the edit buffer, wrap-around occurs and recording continues at offset zero.

<Left arrow>: Medium-speed rewind. The current position is decremented by 256, which corresponds to about 1/8 second of voice time.

<Right arrow>: Medium-speed forward. The current position is incremented by 256.

<Ctrl left arrow>: Fine rewind. The current position is decremented by 1 byte.

<Ctrl right arrow>: Fine forward. The current position is incremented by 1 byte.

<Page Up>: High-speed rewind. The current position is decremented by 8192, which corresponds to about 4 seconds of voice time.

<Page Down>: High-speed forward. The current position is incremented by 8192.

<Home>: The current position is set to zero.

<End>: The current position is set to match the end-of-file position.

<Ctrl end>: The end-of-file position is set to match the current position.

<0> through <9>: Set marker. There are 10 markers, numbered 0 through 9. Each marker is a slot in which a "current position" may be stored. Any time a digit key is pressed, regardless of the stopped/playing/recording state, the current position at that instant is copied into the corresponding marker. The marker values are displayed in a window in the lower left area of the screen.

<Alt 0> through <Alt 9>: Pressing a digit key (on the main section of the keyboard, NOT the numeric keypad) while holding the <Alt> key down changes the current position to the value stored in the corresponding marker.

<F5> and <F6>: Change which marker numbers are assigned the "begin" and "end" flags. In the left column of the marker window, two of the marker number positions always contain 'beg' and 'end' rather than a digit. These are the ones used in any operation that refers to a "marked section". Initially, marker 0 is the "begin" marker and marker 1 is the "end" marker. Press <F5> repeatedly to move 'beg' to the desired marker. Press <F6> repeatedly to move 'end' to the desired marker. The two flags cannot be assigned to the same marker.

<Tab>: Sets the current position to match the "begin" marker and starts a playback operation which terminates at the "end" marker.

<F7>: Requests a filename from the user. The contents of the marked section of the edit buffer are written to this file. If the file already exists, it is overwritten. The current filename remains unchanged.

<F8>: Requests a filename from the user. The contents of this file are copied into the edit buffer starting at the "begin" marker. The "end" marker is changed to reflect the size of the file. The current filename remains unchanged.

<F9>: Erases the marked section (fills it with zeros).

<F10>: Enters a mode in which text may be typed into the column of the marker window titled "comments". These are simply reference notes and have no effect on the operation of the editor. Press <Esc> to exit comment entry mode.

Graphical Print Files
---------------------

These files are prepared for output to an HP LaserJet Plus printer with the minimum memory configuration (512k). To print one of the files, use "COPY /B <filename> LPT1:" (or LPT2 if appropriate). The contents of each file are as follows:

     Filename    Density   Description
     --------    -------   -----------
     VMSCH.HPP   150 dpi   The schematic of the Digitizer.

     VMPCB.125   300 dpi   A positive print of the "copper side" of a
                           single-sided circuit board implementing the
                           Digitizer, suitable for photo-reduction to
                           board manufacturing negatives. Scale is 1.250,
                           producing the largest image that will fit in
                           the LaserJet's 512k memory.

     VMSLK.125   300 dpi   A positive print of a silkscreen component
                           placement guide for the component side of the
                           board. This may be either silkscreened onto
                           the board or simply printed out and referred
                           to while building the board. Scale is 1.250.

     VMDRL.125   300 dpi   A drilling guide for use in making
                           numeric-control tool tapes with a digitizing
                           pad. This print will not be of much use to
                           those who will be drilling the holes by hand.
                           Scale is 1.250.

     VMPCB.100   300 dpi   A duplicate of VMPCB.125, but scaled 1:1 for
                           use with contact-print or direct-transfer
                           methods of producing the negatives.

     VMSLK.100   300 dpi   A duplicate of VMSLK.125, scaled 1:1.

     VMDRL.100   300 dpi   A duplicate of VMDRL.125, scaled 1:1.

Because the printed circuit board files are large, and because most users will probably not want to manufacture a board for this device, these files are placed in a separate archive. Only the schematic, VMSCH.HPP, is included in this archive.

All of these plots are available to registered users, formatted for output on a variety of other printers and pen plotters (including photoplotters). Contact Farpoint Software at the address or CompuServe ID shown in the Shareware Notice section of this document.

Schematic Notes
---------------

The circuit is designed to operate from two 9-volt batteries connected to J1 and J2. The original circuit used a single-ended supply; this modification requires fewer parts and produces the correct RS-232 voltages at the output.

Pad resistors have been added to the trimpot. In the original version, this control was somewhat difficult to adjust. The pad resistors decrease its sensitivity enough to allow a 1-turn potentiometer to be used, reducing the length of the "hunt" for the proper position.

If your serial port uses a DB-9 connector, the cable from J4 is:

     J4 pin 1 -------- DB-9 pin 5 (Ground)
     J4 pin 2 -------- DB-9 pin 8 (CTS)

If your serial port uses a DB-25 connector, the cable from J4 is:

     J4 pin 1 -------- DB-25 pin 7 (Ground)
     J4 pin 2 -------- DB-25 pin 5 (CTS)

The circuit consists of two stages of voltage amplification, with some high-pass filtering built into the coupling capacitors, followed by a differentiator.
The output of the differentiator is fed to a voltage comparator, producing an output which has approximately the following relationship to the input from the microphone:

     If the derivative of the speech waveform is positive, the output is logic zero;
     If the derivative of the speech waveform is negative, the output is logic one.

The transition timing at the output is entirely analog in nature; there is no synchronizing clock signal anywhere in the circuit. If the output of this circuit is connected directly to a speaker, the resulting sound will still be an understandable version of the input. Since the output consists of nothing but a digital bit stream, the computer's job is simply to record and accurately reproduce this bit stream.

The trimpot at the input of amplifier U3 is used to set the DC idle voltage at the output of the differentiator to somewhere near the threshold of comparator U4. There will be a considerable amount of noise at the output of U3, originating at the microphone and within the input circuitry of U1, and highly amplified by U1 and U2. The trimpot should be adjusted so that the comparator threshold is just outside the normal excursion of the noise signal ("off to one side"); otherwise, "silence" at the microphone will become, at the computer's speaker output, a loud hiss with a strong component at half the sampling frequency.

I used LF356s for U1, U2, and U3, and an LM393 for U4. All amplifiers should have power supply bypass capacitors (not shown). The microphone is a 600-ohm dynamic type. The +/-12 volt power supply should be quiet and well-regulated; the one in the PC is too noisy unless you use heavy filtering.

Power supply bypassing consists of attaching capacitors in the 0.1 uF range (up to 1 uF is OK) DIRECTLY across the power supply pins of each amplifier chip. Layout is important here: the capacitors should be connected to the pins of the chips with the shortest possible wire length.
Eight capacitors are required: one from +12 to ground and one from -12 to ground for each of the four chips. If you use dual or quad amplifier chips instead of the LF356s, then only one pair of capacitors is required per physical chip. The purpose of the bypass capacitors is to provide a highly localized, low-impedance power source at each chip, preventing unwanted positive feedback between different chips through the power leads.

Comments on the Digitization Technique
--------------------------------------

The PC's speaker and its associated driver circuitry are quite simple and crude, having been designed primarily for creating single square-wave tones of various audio frequencies. The speaker is typically driven by a pair of transistors used as a current amplifier, which is in turn driven directly by the output of a TTL gate. This allows only two possible voltages across the voice coil: 0 volts and 5 volts. Any sound to be reproduced by this system must be reduced to an approximation in the form of a stream of constant-amplitude, variable-width rectangular pulses.

Examining a speech waveform on an oscilloscope quickly shows that it will not be possible to mimic this waveform even remotely under these restrictions. Much of the information contained in the waveform is in the form of amplitude variations, and this is the one attribute we cannot reproduce.

It is initially tempting to use the technique of the "class D" amplifier to create the waveform, using high-speed pulse width modulation and depending on the mechanical characteristics of the speaker and those of the human ear to provide the missing low-pass filtering. Assuming a sampling rate of 8 kHz (based on the Nyquist criterion) and, to conserve memory, assuming the samples contain only 4 bits of amplitude information (16 levels), data accumulates at a rate of 4k bytes per second, which is certainly acceptable.

The problem comes when we try to play back the sound. Pulses occur at intervals of 125 microseconds, which doesn't seem too bad, but since each pulse can have 16 possible widths, the pulses must be timed with a resolution of under 8 microseconds (125/16, or about 7.8). This is only a couple of instruction times on a 4.77 MHz XT, and even on a fast 80386 it doesn't give the CPU much time between bits to shift bits, read and increment a pointer, check the pointer to see whether it's done yet, etc., not to mention the difficulty of servicing unrelated interrupts.

The search for simpler (but still usable) and less CPU-intensive methods of reproducing speech leads to the question of what information in the waveform we can discard without an unacceptable loss of intelligibility. My experiments with running speech signals through a graphic equalizer revealed that the lower-frequency components, those most visible to the eye on the oscilloscope, are actually of minimal importance in understanding speech. This is also demonstrated by the fact that a whisper is just as understandable as normal speech, but does not make use of vibrating vocal cords, which are the primary source of low-frequency components in the voice. The present digitizer circuit makes use of this observation by filtering out most of the low-frequency components of the sound signal.

Because the speaker cone cannot move instantaneously and acts approximately as a mechanical integrator at high audio frequencies, it makes sense to differentiate the input waveform. The result is that the direction of movement of the speaker cone corresponds to the direction of movement (the derivative) of the waveform. Amplitude information is lost, but as it turns out, the result is understandable enough to be worth pursuing.